1. Data Overview
1.1 Data Quality Assessment
During the data exploration phase, several data quality issues were identified and addressed:
Loan Tape Dataset Issues:
- Missing Values: The co_amt_est field (estimated charge-off amount) has 3,994 missing values out of 83,235 loans (4.8% of records). This field was excluded from predictive modeling because of its missingness.
Loan Performance Dataset Issues:
- Limited Historical Period: The dataset covers performance only through October 2023, providing a relatively short observation window for loans originated in 2021-2023. This limits the ability to observe full lifecycle performance for longer-term loans and may introduce right-censoring bias in default rate estimates.
- Missing Values: The charge_off_date field has 988,390 missing values out of 1,045,858 records (94.5%), which is expected as it should only be populated for charged-off loans (57,468 records).
- Outliers Detected: Some loans show days_delinquent values exceeding 500 days, which appears abnormally high and may indicate data quality issues or exceptional cases requiring investigation.
- Negative Values: Some records show negative values for paid_principal (minimum: -$105.23) and paid_interest (minimum: -$6.47), likely representing payment adjustments or reversals.
Data Cleaning Actions Taken:
- Converted all date fields to datetime format
- Standardized monetary and percentage fields to numeric types
- Handled missing values appropriately (exclusion from modeling or imputation where suitable)
- Validated loan status transitions and delinquency classifications
- Excluded WRITTEN_OFF loans (cancelled transactions) from analysis
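The cleaning actions above could look roughly like the following pandas sketch. It is illustrative only: the toy rows and every column name other than co_amt_est are hypothetical (including the status column that carries WRITTEN_OFF), and the report does not state how its own cleaning code was written.

```python
import pandas as pd

# Toy loan tape. Only co_amt_est and the WRITTEN_OFF label come from the report;
# the other column names and all values are hypothetical.
tape = pd.DataFrame({
    "loan_id": [1, 2, 3],
    "origination_date": ["2023-07-15", "2023-08-01", "2023-09-30"],
    "approved_amount": ["5,437", "4278", "3200"],
    "int_rate": ["8.05%", "18.2%", "25.32%"],
    "loan_status": ["CURRENT", "WRITTEN_OFF", "CHARGED_OFF"],
    "co_amt_est": [None, None, 1250.0],
})

def clean_tape(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # Date fields to datetime
    out["origination_date"] = pd.to_datetime(out["origination_date"])
    # Monetary and percentage fields to numeric types
    out["approved_amount"] = pd.to_numeric(
        out["approved_amount"].str.replace(",", "", regex=False)
    )
    out["int_rate"] = pd.to_numeric(out["int_rate"].str.rstrip("%")) / 100.0
    # Drop cancelled transactions
    out = out[out["loan_status"] != "WRITTEN_OFF"]
    # Exclude the sparsely populated field from the modeling frame
    return out.drop(columns=["co_amt_est"]).reset_index(drop=True)

cleaned = clean_tape(tape)
```

Percentages are stored as fractions here (8.05% becomes 0.0805); that convention is a choice made for the sketch.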
1.2 Portfolio Composition by Program
| Program | Loan Count | Avg Loan Term (months) | Avg MDR (%) | Avg Int Rate (%) | Avg FICO | Avg Approved Amount ($) |
| P1 | 31,957 | 21.49 | 4.67 | 8.05 | 771 | 5,437 |
| P2 | 32,785 | 21.71 | 5.37 | 18.20 | 696 | 4,278 |
| P3 | 18,493 | 16.12 | 7.62 | 25.32 | 606 | 3,200 |
| Overall | 83,235 | 20.39 | 5.60 | 15.88 | 705 | 4,483 |
1.3 Portfolio Trends & Credit Quality Evolution
Figure 1.2a: Average FICO Score by Vintage Quarter - Shows overall credit quality trend over time
Figure 1.2b: Average FICO Score by Vintage Quarter and Program - Credit quality remains relatively stable within each program
Figure 1.2c: Program Mix by Vintage Quarter - Shows strategic shift toward P1 (prime) originations and away from P3 (subprime)
Key Trends:
- Credit Quality Evolution: Average FICO declined from 2021Q2 through 2022, then improved as program mix shifted toward higher quality
- Program Mix Shift: P1 share increased from ~30% to ~56% of originations since 2023Q2
- P3 Reduction: P3 share decreased from 29% peak (2022Q4) to ~12% in latest quarter
- Strategic Direction: Portfolio demonstrates deliberate shift toward more conservative credit strategy
1.4 Historical Roll Rate Analysis
Figure 1.3: UPB-weighted monthly transition probabilities between delinquency states. Shows the likelihood of loans moving from one state (rows) to another (columns) in the next month.
Key Roll Rate Insights:
- Current Loans (93.5% stay Current): Strong performance with only 4.0% rolling to 1-29 DPD and 2.3% paying off
- Early Delinquency (1-29 DPD):
  - 28.6% cure back to Current
  - 36.8% remain in 1-29 DPD
  - 31.5% roll to 30-59 DPD (deterioration)
- 30-59 DPD: 68.6% roll to 60-89 DPD, showing rapid deterioration once loans reach 30+ days delinquent
- Late Stage Delinquency (90-119 DPD): 80.4% default, with minimal cure or payoff probability
- Severe Delinquency (120+ DPD): 86.8% default rate, effectively terminal state
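A UPB-weighted transition matrix of the kind summarized above can be computed by summing balance over each (from state, to state) pair and dividing by the row total. The sketch below uses toy observations, not report data, and the function name is hypothetical.

```python
from collections import defaultdict

# (state this month, state next month, UPB this month): toy values only.
observations = [
    ("Current", "Current", 900.0),
    ("Current", "D1-29", 100.0),
    ("D1-29", "Current", 300.0),
    ("D1-29", "D1-29", 300.0),
    ("D1-29", "D30-59", 400.0),
]

def upb_weighted_roll_rates(rows):
    """Return {from_state: {to_state: share of from_state UPB}}."""
    flow = defaultdict(lambda: defaultdict(float))
    for from_state, to_state, upb in rows:
        flow[from_state][to_state] += upb
    matrix = {}
    for from_state, targets in flow.items():
        total = sum(targets.values())
        matrix[from_state] = {to: amount / total for to, amount in targets.items()}
    return matrix

roll = upb_weighted_roll_rates(observations)
```

Weighting by UPB rather than loan count means large balances dominate each row, which may explain differences from count-based roll rates.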
1.5 Default Rate by Loan Term and Program
Table: Default Rate by Loan Term and Program (%)
| Term (months) | P1 | P2 | P3 |
| 3 | 0.14 | 2.26 | 14.64 |
| 6 | 0.82 | 4.64 | 25.53 |
| 12 | 1.79 | 7.64 | 44.87 |
| 24 | 3.53 | 10.70 | 28.51 |
| 36 | 5.88 | NaN | NaN |
Figure 1.4: Default rates by loan term and program. Shows default performance across different loan maturities.
Understanding Term-Based Default Performance:
This analysis shows the cumulative default rate for each loan term by program, based on loans that have reached their terminal state (paid off or defaulted) and had sufficient time to mature by the October 2023 cutoff date. This vintage-complete approach provides unbiased estimates of default risk by loan maturity structure.
Key Observations:
- Program Risk Hierarchy: P3 (subprime) consistently shows the highest default rates across all terms (14-45%), followed by P2 (near-prime, 2-11%), with P1 (prime) showing the lowest default rates (0.14-5.88%)
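- Term Length Impact: Default rates rise with term for P1 (0.14% at 3 months to 5.88% at 36 months) and P2 (2.26% at 3 months to 10.70% at 24 months). P3 is the exception: its default rate peaks at 12 months (44.87%) and falls to 28.51% at 24 months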
- Risk Concentration: The 12-month P3 segment shows the highest default risk (44.87%), indicating that subprime 12-month loans represent the riskiest segment in the mature portfolio
- Credit Quality Differentiation: Clear risk segmentation by program - P1 maintains excellent performance across all terms, P2 shows moderate credit risk, and P3 demonstrates material credit deterioration
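The vintage-complete filter described above (terminal state reached, and enough time to mature before the cutoff) can be sketched as follows. The status labels, the day-of-month on the cutoff, and the toy loans are assumptions for illustration; the report does not give its exact maturity rule.

```python
from collections import defaultdict
from datetime import date

CUTOFF = date(2023, 10, 31)            # October 2023 cutoff; the day is assumed
TERMINAL = {"PAID_OFF", "DEFAULTED"}   # hypothetical status labels

def months_between(start, end):
    """Whole calendar months from start to end (day of month ignored)."""
    return (end.year - start.year) * 12 + (end.month - start.month)

def vintage_complete_default_rates(loans):
    """loans: iterable of (program, term_months, origination_date, status)."""
    counts = defaultdict(lambda: [0, 0])  # (program, term) -> [defaults, total]
    for program, term, originated, status in loans:
        matured = months_between(originated, CUTOFF) >= term
        if matured and status in TERMINAL:
            bucket = counts[(program, term)]
            bucket[0] += status == "DEFAULTED"
            bucket[1] += 1
    return {key: defaults / total for key, (defaults, total) in counts.items()}

toy_loans = [
    ("P3", 12, date(2022, 1, 10), "DEFAULTED"),
    ("P3", 12, date(2022, 3, 5), "PAID_OFF"),
    ("P3", 12, date(2023, 6, 1), "DEFAULTED"),  # excluded: not yet mature
    ("P1", 6, date(2023, 1, 20), "PAID_OFF"),
    ("P1", 6, date(2023, 2, 20), "CURRENT"),    # excluded: not terminal
]
rates = vintage_complete_default_rates(toy_loans)
```

Under this rule a loan that defaults early but has not yet reached its scheduled maturity is left out, so recent vintages do not distort the rate in either direction.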
2. Hybrid Transition Model Methodology
2.1 Model Architecture
The analysis employs a hybrid transition model that combines:
- Logistic Regression Models for Current state transitions:
  - Current → D1-29 (Early Delinquency): Full feature set with delinquency history
  - Current → Payoff (Early Payoff): Simplified categorical features
- Empirical Transition Matrices for delinquent loans:
  - Program × Term matrices for D1-29, D30-59, D60-89, D90-119, D120+ states
  - Historical roll rates for cure, charge-off, and payoff transitions
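One way to picture the hybrid dispatch is a single function that scores Current loans with the two logistic models and looks up delinquent loans in an empirical table keyed by state, program, and term. This is a sketch under stated assumptions: the intercepts, the toy matrix, and the way the two binary probabilities are combined into one row are all hypothetical; only the two slope values are quoted from the report.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Intercepts are hypothetical; the two slopes are the values quoted in the report.
D1_29_MODEL = {"intercept": -3.0, "fico_740+": -0.7795}
PAYOFF_MODEL = {"intercept": -4.0, "upb_0-1k": 3.5162}

# Toy empirical row for one (state, program, term) cell; not report data.
EMPIRICAL = {
    ("D1-29", "P1", 12): {"Current": 0.30, "D1-29": 0.35, "D30-59": 0.30, "Payoff": 0.05},
}

def score(model, features):
    """Linear score: intercept plus coefficients of the dummies that are switched on."""
    return model["intercept"] + sum(
        coef for name, coef in model.items()
        if name != "intercept" and features.get(name, 0)
    )

def transition_probs(state, program, term, features):
    if state == "Current":
        p_dq = sigmoid(score(D1_29_MODEL, features))
        p_payoff = sigmoid(score(PAYOFF_MODEL, features))
        total = p_dq + p_payoff
        if total > 1.0:  # assumed safeguard: rescale if the two binary models overlap
            p_dq, p_payoff = p_dq / total, p_payoff / total
        return {"Current": max(0.0, 1.0 - p_dq - p_payoff), "D1-29": p_dq, "Payoff": p_payoff}
    # Delinquent states: empirical Program x Term lookup
    return EMPIRICAL[(state, program, term)]

prime = transition_probs("Current", "P1", 12, {"fico_740+": 1})
baseline = transition_probs("Current", "P1", 12, {})
```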
2.2 D1-29 Early Delinquency Model
Features (22): Program dummies, FICO score buckets, loan amount buckets, age buckets, loan term dummies, and delinquency history (ever_D30); see Appendix A.4 for the full list
Top 5 Feature Coefficients:
fico_fico_740+: -0.7795
age_7-12m: -0.5118
age_4-6m: -0.4819
age_13-18m: -0.3611
age_2-3m: -0.3321
Model Performance: AUC-ROC = 0.7579
2.3 Payoff Model
Features (21): Program dummies, loan term dummies, age buckets,
FICO buckets, and UPB (unpaid principal balance) buckets
Top 5 Feature Coefficients:
upb_upb_0-1k: +3.5162
term_36: +1.0491
upb_upb_7.5k+: -0.8641
term_60: +0.8199
upb_upb_5-7.5k: -0.7789
Model Performance: AUC-ROC = 0.9328
2.4 Empirical Roll Rate Matrices
For delinquent loans (D1-29, D30-59, D60-89, D90-119, D120+), we use historical transition probabilities stratified by Program × Term. Key roll rates observed:
- D1-29 → Current (Cure): 27.0%
- D1-29 → D30-59: 35.3%
- D90-119 → Charge-off: 79.7%
- D120+ → Charge-off: 86.0%
3. Model Validation & Performance
3.1 Overall Model Fit
Figure 1: Predicted vs Actual rates by loan age for D1-29 and Payoff models (Train and Test sets)
3.2 Performance by Age Bucket (Vintage)
Figure 2: Model performance across different loan age vintages
3.3 Performance by Loan Term
Figure 3: Model performance segmented by loan term (12m, 24m, 36m, 48m, 60m)
Model Validation Summary:
- Models show good calibration between predicted and actual rates across train and test sets
- Performance is consistent across programs (P1, P2, P3) with slight variations by term
- Models capture age-based dynamics: delinquency peaks in early months, payoffs increase near maturity
- No significant signs of overfitting or instability across different segmentations
4. Cashflow Projection & Scenario Analysis
4.1 Methodology
Cashflows are projected month-by-month over a 60-month horizon:
- Portfolio: 8,141 active loans from 2023Q3 origination cohort (July-September 2023), observed as of October 2023
- Vintage Cohort: Focus on most recent origination quarter to eliminate vintage effects and provide forward-looking analysis
- Starting Point: Current UPB (unpaid principal balance) as of October 2023 reporting date
- Projection Horizon: 60 months forward
- Monthly Process:
  - Predict delinquency and payoff probabilities using hybrid transition model
  - Sample state transitions based on predicted probabilities
  - Calculate scheduled payments, prepayments, defaults, and recoveries
  - Update loan states and balances for next month
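The sampling step of the monthly process can be sketched with inverse-CDF sampling over a row of transition probabilities. The three-row table below is a simplified toy (it collapses the later delinquency buckets into one), and the function names are hypothetical; balances and cashflows are left out to keep the focus on state transitions.

```python
import random

TERMINAL = {"Payoff", "Charge-off"}

# Simplified toy transition table; not the report's matrices.
TABLE = {
    "Current": {"Current": 0.935, "D1-29": 0.040, "Payoff": 0.025},
    "D1-29": {"Current": 0.30, "D1-29": 0.35, "D30+": 0.35},
    "D30+": {"D30+": 0.20, "Charge-off": 0.80},
}

def sample_state(probs, rng):
    """Draw the next state from a {state: probability} row."""
    u = rng.random()
    cumulative = 0.0
    for state, p in probs.items():
        cumulative += p
        if u < cumulative:
            return state
    return state  # rounding guard when the row sums to just under 1

def simulate_path(start_state, table, months, rng):
    """Single-path simulation: one sampled state per month until a terminal state."""
    path = [start_state]
    for _ in range(months):
        current = path[-1]
        if current in TERMINAL:
            break
        path.append(sample_state(table[current], rng))
    return path

path = simulate_path("Current", TABLE, 60, random.Random(42))
```

Because each loan follows a single sampled path, results depend on the random seed; fixing the seed keeps a run reproducible.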
4.2 Key Assumptions
Pricing & Economics:
- Purchase Price: (1 - MDR% + 1%) × Approved Amount
  - MDR (Market Discount Rate) varies by loan: average 5.6%
  - Spread: 1.0% above par (premium pricing)
  - Effective purchase price: ~95.4% of approved amount on average
- Cost of Funding: SOFR + 1.5% = 3.6% + 1.5% = 5.1%
- Leverage: 85% LTV (Loan-to-Value)
  - Debt service calculated monthly on outstanding balance
- Recovery Rate: 0% (conservative assumption: no recoveries on charged-off loans are modeled)
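The pricing and funding arithmetic above can be checked with a few lines. The sketch applies the stated purchase price formula to the portfolio-average approved amount from Section 1.2. It assumes that the 85% LTV is measured against purchase price and that debt interest accrues as balance × annual cost / 12; the report states neither explicitly.

```python
def purchase_price(approved_amount, mdr, spread=0.01):
    """(1 - MDR% + 1%) x approved amount, with rates expressed as fractions."""
    return (1.0 - mdr + spread) * approved_amount

SOFR = 0.036
FUNDING_SPREAD = 0.015
cost_of_funding = SOFR + FUNDING_SPREAD  # 5.1% per year

def monthly_debt_interest(debt_balance, annual_cost=cost_of_funding):
    # Assumed accrual convention: simple monthly interest on the outstanding balance
    return debt_balance * annual_cost / 12.0

price = purchase_price(4483.0, 0.056)  # average approved amount and average MDR
debt = 0.85 * price                    # assumes LTV is applied to purchase price
equity = price - debt
```

With the average MDR of 5.6%, the price factor is 1 - 0.056 + 0.01 = 0.954, matching the ~95.4% effective purchase price quoted above.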
Model Parameters:
- Transition Probabilities: Predicted using hybrid model (logistic regression for Current state, empirical matrices for delinquent states)
- Stress Multipliers: Applied to D1-29 entry rates and charge-off rates to simulate adverse scenarios
- Amortization: Equal monthly installments based on loan term and interest rate
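Equal monthly installments follow the standard annuity formula, payment = P × r / (1 - (1 + r)^-n) with r the monthly rate. The sketch below builds a schedule for a hypothetical loan that uses the portfolio-average amount and interest rate with an assumed 24-month term; it ignores prepayments and delinquency.

```python
def monthly_payment(principal, annual_rate, term_months):
    """Level payment that fully amortizes the loan over term_months."""
    r = annual_rate / 12.0
    if r == 0:
        return principal / term_months
    return principal * r / (1.0 - (1.0 + r) ** -term_months)

def amortization_schedule(principal, annual_rate, term_months):
    """Return rows of (month, interest, principal_paid, ending_balance)."""
    payment = monthly_payment(principal, annual_rate, term_months)
    balance = principal
    rows = []
    for month in range(1, term_months + 1):
        interest = balance * annual_rate / 12.0
        principal_paid = payment - interest
        balance -= principal_paid
        rows.append((month, interest, principal_paid, max(balance, 0.0)))
    return rows

# Hypothetical loan: average approved amount and rate from Section 1.2, 24-month term assumed
schedule = amortization_schedule(4483.0, 0.1588, 24)
```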
4.3 Scenario Definitions
| Scenario | D1-29 Stress | Charge-off Stress | Recovery Rate | Description |
| Base Case | 1.0x | 1.0x | 0% | Historical transition rates, conservative recovery assumption |
| Moderate Stress | 1.2x | 1.5x | 0% | 20% increase in delinquency entry, 50% increase in charge-offs |
| Severe Stress | 1.6x | 2.5x | 0% | 60% increase in delinquency entry, 150% increase in charge-offs |
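A stress multiplier has to be applied so that each transition row still sums to 1. The sketch below shows one plausible approach: scale the stressed probability, cap it at 1, and shrink the remaining probabilities proportionally. The report does not say how it caps or renormalizes, so this scheme and the toy rows are assumptions.

```python
# (D1-29 entry multiplier, charge-off multiplier) per scenario, from the table above
SCENARIOS = {
    "Base Case": (1.0, 1.0),
    "Moderate Stress": (1.2, 1.5),
    "Severe Stress": (1.6, 2.5),
}

def stress_row(probs, stressed_state, multiplier):
    """Scale one transition probability and renormalize the rest of the row."""
    p = probs[stressed_state]
    if p >= 1.0:
        return dict(probs)
    stressed = min(1.0, p * multiplier)      # assumed cap at 100%
    scale = (1.0 - stressed) / (1.0 - p)     # proportional shrink of other states
    return {
        state: (stressed if state == stressed_state else q * scale)
        for state, q in probs.items()
    }

# Toy rows; not the report's matrices.
current_row = {"Current": 0.935, "D1-29": 0.040, "Payoff": 0.025}
d120_row = {"Charge-off": 0.86, "D120+": 0.10, "Current": 0.04}

dq_mult, co_mult = SCENARIOS["Severe Stress"]
stressed_current = stress_row(current_row, "D1-29", dq_mult)
stressed_d120 = stress_row(d120_row, "Charge-off", co_mult)
```

Note that a 2.5x multiplier on an 86% charge-off rate exceeds 100% before capping, so some cap or alternative scaling (for example on odds rather than probabilities) is needed in late-stage buckets.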
4.4 Scenario Results
| Scenario | Investment ($) | Unlevered IRR | Unlevered MOIC | Levered IRR | Levered MOIC | Loss Rate | WAL |
| Base Case | $31.88M | 5.4% | 1.05x | 5.6% | 1.12x | 8.4% | 0.9y |
| Moderate Stress | $31.88M | -3.1% | 0.97x | -17.1% | 0.60x | 12.1% | 0.9y |
| Severe Stress | $31.88M | -10.4% | 0.91x | -34.5% | 0.18x | 14.9% | 0.8y |
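The return metrics in the table can be derived from a monthly cashflow vector. The sketch below computes IRR by bisection on monthly NPV, then MOIC and WAL. The annualization convention (effective annual compounding of the monthly rate) and the treatment of all negative flows as invested capital are assumptions; the report does not state its conventions.

```python
def npv(monthly_rate, cashflows):
    """Net present value of cashflows indexed by month (month 0 first)."""
    return sum(cf / (1.0 + monthly_rate) ** t for t, cf in enumerate(cashflows))

def annualized_irr(cashflows, lo=-0.99, hi=1.0, iterations=200):
    """Bisection on the monthly rate; expects an outflow followed by inflows."""
    if npv(lo, cashflows) < 0 or npv(hi, cashflows) > 0:
        raise ValueError("IRR is not bracketed by [lo, hi]")
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if npv(mid, cashflows) > 0:
            lo = mid
        else:
            hi = mid
    monthly = (lo + hi) / 2.0
    return (1.0 + monthly) ** 12 - 1.0  # assumed annualization convention

def moic(cashflows):
    """Multiple on invested capital: total inflows over total outflows."""
    invested = -sum(cf for cf in cashflows if cf < 0)
    returned = sum(cf for cf in cashflows if cf > 0)
    return returned / invested

def wal_years(principal_by_month):
    """Weighted average life: principal-weighted time of repayment, in years."""
    total = sum(principal_by_month)
    return sum(t * p for t, p in enumerate(principal_by_month, start=1)) / total / 12.0

# Toy example: invest 100 today, receive 110 in month 12
flows = [-100.0] + [0.0] * 11 + [110.0]
irr = annualized_irr(flows)
```

A MOIC below 1.0x, as in both stress scenarios, means total cash returned is less than the amount invested, which is consistent with a negative IRR.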
4.5 Cashflow Breakdown by Scenario
Figure 6: Monthly cashflow components (Interest, Principal, Payoff, Default) across scenarios
Cashflow Analysis Key Takeaways:
- Base Case: Attractive returns with 5.4% unlevered IRR and 8.4% loss rate
- Leverage Impact: 85% LTV lifts the base case IRR only marginally (5.4% unlevered to 5.6% levered) while sharply increasing downside risk
- Stress Performance: Returns turn negative under stress. Moderate stress produces -3.1% unlevered and -17.1% levered IRR, and severe stress leads to -10.4% unlevered and -34.5% levered IRR
- Loss Sensitivity: Charge-off rates are the primary driver of performance variation across scenarios
5. Conclusions & Recommendations
5.1 Model Strengths
- Hybrid Approach: Combines predictive power of machine learning with stability of empirical matrices
- Strong Validation: Models show good out-of-sample performance across multiple dimensions
- Granular Segmentation: Program × Term matrices capture heterogeneity in portfolio behavior
- Interpretability: Feature coefficients provide clear economic intuition (e.g., high FICO reduces delinquency)
5.2 Investment Highlights
- Attractive Risk-Adjusted Returns: Base case unlevered IRR of 5.4% with 0.9y WAL
- Leverage Opportunity: 85% LTV financing raises base case equity returns from 5.4% (unlevered IRR) to 5.6% (levered IRR)
- Portfolio Quality: Average FICO of 702 with manageable current delinquency levels
- Diversification: Spread across 3 programs and multiple term structures reduces concentration risk
5.3 Risk Considerations
- Credit Deterioration: Moderate stress scenario shows material IRR compression to -3.1% unlevered
- Leverage Risk: High LTV magnifies downside: severe stress results in a -34.5% levered IRR and a 0.18x levered MOIC
- Model Risk: Projections based on historical data may not capture unprecedented market conditions
- Concentration: The portfolio consists entirely of consumer credit and is exposed to macroeconomic factors (unemployment, interest rates, etc.)
5.4 Recommendations
- Proceed with Investment: Base case economics support investment at current pricing
- Monitor Delinquency Triggers: Implement early warning system for D1-29 entry rate increases
- Optimize Leverage: Consider reducing LTV to 70-75% to improve stress performance
- Portfolio Hedging: Evaluate credit protection strategies for tail risk scenarios
- Model Refresh: Update transition matrices quarterly as new data becomes available
Appendix: Technical Details
A.1 Data Sources
- Loan Performance Data: loan_performance_enhanced.csv (1,012,889 observations)
- Observation Period: October 2019 to October 2023
- Universe: 76,669 unique consumer loans
A.2 Model Training
- Train/Test Split: 70% / 30% random split
- Algorithm: Scikit-learn LogisticRegression with L2 regularization
- Feature Engineering: Categorical bucketing, dummy encoding, standardization
- Validation: Out-of-sample AUC-ROC, calibration plots, segmentation analysis
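The training setup listed above can be sketched with scikit-learn on synthetic data. Everything about the data here is an assumption: the two dummy features stand in for the report's bucketed features, and the labels are generated from a made-up logit rather than real loan performance. The sketch shows only the 70/30 split, the L2-regularized fit, and the out-of-sample AUC calculation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 4000

# Hypothetical 0/1 dummies standing in for the report's bucketed features
fico_740_plus = rng.integers(0, 2, size=n)
ever_d30 = rng.integers(0, 2, size=n)
X = np.column_stack([fico_740_plus, ever_d30])

# Synthetic labels from an assumed logit; these are NOT the report's fitted values
logit = -2.0 - 0.8 * fico_740_plus + 1.5 * ever_d30
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

# 70% / 30% random split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0
)

# L2 regularization is scikit-learn's default penalty for LogisticRegression
model = LogisticRegression(C=1.0, max_iter=1000)
model.fit(X_train, y_train)

# Out-of-sample AUC-ROC on the held-out 30%
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
```

A random row-level split can place observations from the same loan in both train and test sets; splitting by loan would be a stricter out-of-sample check.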
A.3 Software & Tools
- Language: Python 3.12
- Libraries: pandas, numpy, scikit-learn, matplotlib
- Models: hybrid_transition_models.pkl
- Cashflow Engine: Stochastic projection with 60-month horizon (single-path simulation with probabilistic transitions)
A.4 Model Features & Coefficients
D1-29 Early Delinquency Model (22 features - All Categorical)
Feature Categories:
- Program Dummies (2): program_P2, program_P3 (baseline: P1)
- FICO Buckets (4): fico_<620, fico_660-699, fico_700-739, fico_740+ (baseline: fico_620-659)
- Loan Amount Buckets (4): amt_2-4k, amt_4-6k, amt_6-8k, amt_8k+ (baseline: amt_<2k)
- Age Buckets (6): age_2-3m, age_4-6m, age_7-12m, age_13-18m, age_19-24m, age_24m+ (baseline: age_0-1m)
- Term Dummies (5): term_6, term_12, term_24, term_36, term_60 (baseline: term_3)
- Delinquency History (1): ever_D30 (flag for prior 30+ DPD)
Top Positive Coefficients (increase delinquency risk):
- Low FICO (<620): +0.18 (subprime borrowers show higher delinquency risk)
- Program P3: +0.14 (subprime program shows elevated early delinquency)
- Prior Delinquency (ever_D30): +0.13 (history of 30+ DPD predicts future delinquency)
- Higher Loan Amounts: Larger loans (4k-8k+) show slightly higher early delinquency risk
Top Negative Coefficients (reduce delinquency risk):
- High FICO (740+): -0.78 (prime borrowers least likely to become delinquent)
- Mid-Age Loans (7-12m): -0.51 (seasoned loans past early payment shock period)
- Good FICO (700-739): -0.33 (near-prime borrowers show strong payment performance)
- Loan Age (4-6m, 2-3m): -0.48, -0.33 (loans that survive first month show lower delinquency risk)
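Logistic regression coefficients are on the log-odds scale, so exponentiating one gives an odds ratio relative to the omitted baseline bucket. The sketch below converts the fico_740+ coefficient and shows its effect on an illustrative baseline probability. It assumes the coefficient applies to an unstandardized 0/1 dummy; if the dummies were standardized before fitting, the interpretation would differ. The 4.0% baseline is borrowed from the Current to 1-29 DPD roll rate purely for illustration.

```python
import math

def odds_ratio(coefficient):
    """Multiplicative change in odds when a 0/1 dummy switches on."""
    return math.exp(coefficient)

def shifted_probability(baseline_prob, coefficient):
    """Probability after applying a log-odds shift to a baseline probability."""
    odds = baseline_prob / (1.0 - baseline_prob) * math.exp(coefficient)
    return odds / (1.0 + odds)

fico_740_or = odds_ratio(-0.7795)                     # fico_740+ vs. the fico_620-659 baseline
prime_monthly_dq = shifted_probability(0.04, -0.7795)  # illustrative 4.0% baseline
```

Under these assumptions the odds of entering D1-29 for a 740+ borrower are a bit less than half those of the baseline bucket, holding other features fixed.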
Payoff Model (21 features - All Categorical)
Feature Categories:
- Program Dummies (2): program_P2, program_P3 (baseline: P1)
- Age Buckets (6): age_2-3m, age_4-6m, age_7-12m, age_13-18m, age_19-24m, age_25+ (baseline: 0-1m)
- FICO Buckets (4): fico_620-659, fico_660-699, fico_700-739, fico_740+ (baseline: <620)
- Term Dummies (5): term_12, term_24, term_36, term_48, term_60 (baseline: term_6)
- UPB Buckets (4): upb_0-1k, upb_2.5-5k, upb_5-7.5k, upb_7.5k+ (baseline: upb_1-2.5k)
Key Drivers of Early Payoff:
- Very Low Balance (upb_0-1k): Strongest predictor (+3.52 coefficient) - loans with balances under $1k are highly likely to be paid off
- Longer Terms (36m, 60m): Positive coefficients (+1.05, +0.82) - likely capturing maturity effect as longer-term loans approach their scheduled payoff
- Higher FICO (740+): +0.27 (prime borrowers more likely to refinance or pay off early)
Negative Drivers (reduce payoff probability):
- High Balance (upb_7.5k+): -0.86 (larger remaining balances are harder to pay off early)
- Mid-Range Balances (upb_5-7.5k, upb_2.5-5k): -0.78, -0.41 (balances above $2.5k show lower payoff likelihood relative to the upb_1-2.5k baseline)
- Loan Age (7-24m): Negative coefficients indicating lower payoff probability during mid-life of loan
Empirical Transition Matrices
Structure: 5 matrices (one per delinquency state) × 3 programs × 6 term buckets = 90 unique transition probability vectors
Key Roll Rates (aggregate across all programs/terms):
- D1-29 → Current (Cure): 27.0%
- D1-29 → D30-59 (Roll): 35.3%
- D1-29 → Stay D1-29: 35.3%
- D1-29 → Payoff: 4.5%
- D90-119 → Charge-off: 79.7%
- D120+ → Charge-off: 86.0%
A.5 Detailed Performance by Program and Term
Appendix Figure A1: Detailed model performance segmented by Program (P1, P2, P3) and Loan Term